1. Descriptive Analysis¶

In [1]:
import pandas as pd
import datetime
import os
os.chdir('C:\\Users\\toramanse\\Downloads')
In [2]:
df = pd.read_csv('all_ticks_long.csv')
df.info()
df.isnull().sum()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2848030 entries, 0 to 2848029
Data columns (total 3 columns):
 #   Column      Dtype  
---  ------      -----  
 0   short_name  object 
 1   timestamp   object 
 2   price       float64
dtypes: float64(1), object(2)
memory usage: 65.2+ MB
Out[2]:
short_name    0
timestamp     0
price         0
dtype: int64

As timestamp is object, timestamp with datetime type is to be added as timestamp_dt.

In [3]:
df['timestamp_dt'] = pd.to_datetime(df['timestamp'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2848030 entries, 0 to 2848029
Data columns (total 4 columns):
 #   Column        Dtype              
---  ------        -----              
 0   short_name    object             
 1   timestamp     object             
 2   price         float64            
 3   timestamp_dt  datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), float64(1), object(2)
memory usage: 86.9+ MB
In [4]:
df.describe(include = 'all')
Out[4]:
short_name timestamp price timestamp_dt
count 2848030 2848030 2.848030e+06 2848030
unique 60 50012 NaN NaN
top GARAN 2018-03-28T14:45:00Z NaN NaN
freq 49308 60 NaN NaN
mean NaN NaN 9.282380e+00 2016-04-26 08:05:50.160145664+00:00
min NaN NaN 0.000000e+00 2012-09-17 06:45:00+00:00
25% NaN NaN 1.764200e+00 2014-08-28 07:00:00+00:00
50% NaN NaN 4.427600e+00 2016-06-02 09:45:00+00:00
75% NaN NaN 9.280000e+00 2018-01-05 07:15:00+00:00
max NaN NaN 1.394288e+02 2019-07-23 15:00:00+00:00
std NaN NaN 1.555997e+01 NaN

There are 60 stocks.\ Data have been collected between 2012-09-17 and 2019-07-23.\ Minimum of price requires to be more discovered more.

In [5]:
df_summ_byName = df.groupby(['short_name'])['price'].describe().sort_values(by = 'count', ascending = False)
print(df_summ_byName)
              count       mean        std     min       25%      50%  \
short_name                                                             
GARAN       49308.0   7.899734   1.249637  0.0001   7.01540   7.6542   
THYAO       49282.0   9.288821   4.027293  0.0001   6.43000   7.7800   
YKBNK       49225.0   2.566327   0.422774  0.0001   2.26820   2.6093   
ISCTR       49221.0   5.126551   1.003386  0.0001   4.32000   4.8543   
VAKBN       49212.0   4.735438   0.977889  0.0001   4.03220   4.4742   
AKBNK       49209.0   6.473105   0.944955  0.0001   5.85000   6.3057   
PETKM       49184.0   2.539237   1.378510  0.0001   1.28690   2.2845   
EREGL       49173.0   4.179544   2.690731  0.0001   2.18120   3.0360   
KRDMD       49161.0   1.768390   0.940092  0.0001   1.08450   1.3979   
TUPRS       49143.0  62.994535  32.398117  0.0001  34.54910  49.5542   
TCELL       49143.0   9.828003   2.356250  0.0001   8.56630   9.7001   
SAHOL       49095.0   8.615896   0.955310  0.0001   7.96520   8.6079   
KCHOL       49093.0  12.248291   3.181444  0.0001   9.73680  12.0449   
SISE        49090.0   3.048367   1.422848  0.0001   1.92200   2.6682   
TTKOM       49077.0   5.660680   0.818598  0.0001   5.26730   5.7464   
HALKB       49071.0  10.919353   3.071563  0.0001   8.72050  10.6531   
GUBRF       49057.0   4.328323   1.222988  0.0001   3.27650   4.2500   
ARCLK       49045.0  15.388088   4.531459  0.0001  11.71110  15.0100   
FROTO       48995.0  32.763693  14.732664  0.0001  21.49380  27.1182   
GOODY       48961.0   3.102485   0.886456  0.0001   2.42770   3.1920   
TOASO       48946.0  16.597275   6.328241  0.0001  10.36560  16.5554   
BRISA       48937.0   6.544896   1.295321  0.0001   5.89000   6.7300   
TKFEN       48930.0   9.191809   6.667047  0.0001   4.31900   5.7532   
MGROS       48903.0  19.576424   3.901269  0.0001  16.66000  19.1100   
TRKCM       48886.0   2.027849   1.099667  0.0001   1.17420   1.6270   
ZOREN       48807.0   1.248124   0.311330  0.0001   1.03380   1.2500   
ASELS       48803.0  13.432535   9.624246  0.0001   4.94030   9.2757   
OTKAR       48785.0  81.419528  27.782825  0.0001  56.77570  82.8224   
VESTL       48781.0   5.942711   2.830465  0.0000   4.02000   6.3200   
CCOLA       48749.0  36.890707   6.747213  0.0001  31.97820  34.8215   
BAGFS       48650.0  10.407127   3.618058  0.0001   8.26175  10.6100   
AKSA        48594.0   7.127504   2.710033  0.0001   5.20880   6.9853   
KARSN       48527.0   1.326907   0.290413  0.0001   1.11000   1.2874   
ECILC       48492.0   2.075865   0.973788  0.0001   1.17230   1.8214   
PRKME       48466.0   2.927109   0.721949  0.0001   2.38950   2.7400   
ASUZU       48433.0   6.467033   2.201036  0.0001   5.07480   5.9496   
TSKB        48384.0   0.945233   0.155276  0.0001   0.82540   0.9373   
ALARK       48335.0   2.060859   0.575943  0.0001   1.56890   1.9376   
SODA        48276.0   3.189591   2.045764  0.0001   1.47580   2.6684   
AKSEN       48171.0   3.183542   0.724332  0.0000   2.67000   2.9300   
ANACM       48165.0   1.672102   0.788365  0.0001   1.04700   1.2597   
AEFES       48131.0  20.982235   2.494002  0.0001  19.16050  20.6465   
AYGAZ       48119.0   8.101948   2.610402  0.0001   5.95150   7.7238   
BANVT       47951.0   7.628230   6.267278  0.0000   2.59000   3.7100   
USAK        47659.0   1.220452   0.459532  0.0001   0.95710   1.0500   
SASA        47633.0   2.294876   2.492934  0.0001   0.31920   0.7335   
KRDMB       47532.0   2.222798   0.686385  0.0001   1.56120   2.2007   
SKBNK       47270.0   1.473651   0.294908  0.0001   1.20000   1.5100   
ALBRK       46862.0   1.365549   0.167824  1.0255   1.22510   1.3602   
CEMAS       46394.0   1.209088   0.799981  0.0000   0.70000   0.8700   
YATAS       46055.0   2.434249   2.552377  0.0001   0.38860   0.9658   
KAREL       46032.0   3.178023   2.133619  0.0001   1.53130   1.8200   
TUKAS       45929.0   1.737529   0.867095  0.6500   1.06000   1.5300   
YUNSA       45528.0   4.079695   1.347020  0.0001   3.00670   4.1078   
PARSN       45325.0   8.276989   4.662471  0.0000   4.57000   7.8900   
PGSUS       45221.0  24.789487   7.656535  0.0000  17.79000  25.6400   
ICBCT       44336.0   2.828502   1.789883  0.0000   1.55960   2.0300   
ISYAT       43184.0   0.537338   0.160246  0.0001   0.44120   0.4957   
ISFIN       42877.0   1.559420   1.764839  0.0001   0.56390   0.8635   
ISDMR       12227.0   5.351663   1.697918  1.0181   4.85420   5.9063   

                  75%       max  
short_name                       
GARAN         8.67860   12.1554  
THYAO        12.27000   19.9500  
YKBNK         2.87400    3.9581  
ISCTR         5.82030    7.9639  
VAKBN         5.24600    7.5814  
AKBNK         6.93250    9.2124  
PETKM         3.88280    5.7697  
EREGL         6.75870   10.4710  
KRDMD         2.16900    4.9510  
TUPRS        93.42870  139.2937  
TCELL        11.23640   15.8125  
SAHOL         9.26820   11.6826  
KCHOL        15.16930   19.1500  
SISE          4.14600    6.9230  
TTKOM         6.26000    7.3500  
HALKB        13.49090   20.2365  
GUBRF         5.13000   13.6191  
ARCLK        19.08770   26.4278  
FROTO        48.51160   65.4192  
GOODY         3.59660   58.7574  
TOASO        20.65130   29.9218  
BRISA         7.33000   10.3275  
TKFEN        14.24675   27.3200  
MGROS        22.10000   30.2600  
TRKCM         2.98260    4.6432  
ZOREN         1.42650    2.4430  
ASELS        22.75670   46.7616  
OTKAR       105.49880  139.4288  
VESTL         7.45000   14.5400  
CCOLA        42.04970   54.2208  
BAGFS        12.35000   38.4352  
AKSA          8.72000   15.1189  
KARSN         1.47000    2.5000  
ECILC         2.78090    4.2278  
PRKME         3.43650    5.4300  
ASUZU         7.12000   15.2800  
TSKB          1.02440    1.4208  
ALARK         2.42140    3.5143  
SODA          4.28610    7.7659  
AKSEN         3.75000    5.1900  
ANACM         2.40210    3.5021  
AEFES        22.73200   28.5090  
AYGAZ        10.26900   13.5935  
BANVT        11.93000   28.6800  
USAK          1.37080    2.7578  
SASA          4.94730    8.4260  
KRDMB         2.72730    4.4960  
SKBNK         1.72070    2.2516  
ALBRK         1.50000    2.1900  
CEMAS         1.50000    7.0100  
YATAS         4.23000   10.6748  
KAREL         5.25000    9.4600  
TUKAS         2.13000    5.9200  
YUNSA         4.72060    9.5275  
PARSN        10.65000   29.8200  
PGSUS        29.44000   50.6500  
ICBCT         4.07000   11.2700  
ISYAT         0.63330    1.1500  
ISFIN         1.67420    9.8300  
ISDMR         6.56070    7.5936  

It seems that minimum value is an issue. Values of 0.0000 and 0.0001 must be checked.\ As stocks have different number of counts, wide format will have got null values. Null values will be handled once wide data is to be used for PCA.

In [6]:
chk_1 = (df['short_name'] == 'GARAN') & (df['price'] == 0.0001)
print(df[chk_1])
       short_name             timestamp   price              timestamp_dt
923352      GARAN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00

Let's see what happened to GARAN on the same and following days.

In [7]:
print(df.loc[923351:923353])
       short_name             timestamp   price              timestamp_dt
923351      GARAN  2013-05-06T14:30:00Z  8.8245 2013-05-06 14:30:00+00:00
923352      GARAN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
923353      GARAN  2013-05-07T06:30:00Z  9.3782 2013-05-07 06:30:00+00:00

Since the values before/after 0.0001 are nowhere nearby, it is obviously an error. Let's see all problematic rows. These values may be removed from the dataset.

In [8]:
chk_2 = (df['price'] <= 0.0001)
print(len(df[chk_2]))
print(df[chk_2])
57
        short_name             timestamp   price              timestamp_dt
3677         AEFES  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
52255        AKBNK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
101192        AKSA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
149396       AKSEN  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
197755       ALARK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
292904       ANACM  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
341494       ARCLK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
390625       ASELS  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
439340       ASUZU  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
487598       AYGAZ  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
535755       BAGFS  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
584341       BANVT  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
632516       BRISA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
681225       CCOLA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
730061       CEMAS  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
776443       ECILC  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
825137       EREGL  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
874182       FROTO  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
923352       GARAN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
972535       GOODY  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1021506      GUBRF  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1070479      HALKB  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1119276      ICBCT  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
1164077      ISCTR  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1224868      ISFIN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1267470      ISYAT  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1310861      KAREL  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1357097      KARSN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1406040      KCHOL  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1454773      KRDMB  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1502645      KRDMD  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1551768      MGROS  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1600490      OTKAR  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1648851      PARSN  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
1694902      PETKM  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1740153      PGSUS  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
1788960      PRKME  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1837687      SAHOL  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1886501       SASA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1934464       SISE  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
1983221      SKBNK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2030487       SODA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2079107      TCELL  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2128293      THYAO  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2177326      TKFEN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2226320      TOASO  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2275288      TRKCM  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2323912       TSKB  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2372619      TTKOM  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2467658      TUPRS  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2516586       USAK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2564510      VAKBN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2613379      VESTL  2013-05-06T16:45:00Z  0.0000 2013-05-06 16:45:00+00:00
2661987      YATAS  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2708568      YKBNK  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2756506      YUNSA  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00
2803110      ZOREN  2013-05-06T16:45:00Z  0.0001 2013-05-06 16:45:00+00:00

These incorrect updates are to be removed. Summary of the data will be updated after removing so that it can be observed whether the error continues with some other so small values.

In [9]:
df = df[df['price'] > 0.0001]
df_summ_byName = df.groupby(['short_name'])['price'].describe().sort_values(by = 'count', ascending = False)
print(df_summ_byName)
              count       mean        std      min        25%       50%  \
short_name                                                                
GARAN       49307.0   7.899894   1.249143   5.0936   7.015400   7.65420   
THYAO       49281.0   9.289010   4.027117   3.2356   6.430000   7.78000   
YKBNK       49224.0   2.566379   0.422620   1.5300   2.268200   2.60930   
ISCTR       49220.0   5.126655   1.003131   3.4441   4.320000   4.85430   
VAKBN       49211.0   4.735534   0.977666   3.0000   4.032200   4.47420   
AKBNK       49208.0   6.473236   0.944514   4.3227   5.850000   6.30570   
PETKM       49183.0   2.539289   1.378477   0.7379   1.286900   2.28450   
EREGL       49172.0   4.179629   2.690693   0.7289   2.181200   3.03600   
KRDMD       49160.0   1.768426   0.940068   0.7042   1.084500   1.39790   
TUPRS       49142.0  62.995817  32.397201  24.1844  34.549100  49.55420   
TCELL       49142.0   9.828203   2.355857   4.3504   8.566300   9.70010   
SAHOL       49094.0   8.616071   0.954528   6.4824   7.965200   8.60790   
KCHOL       49092.0  12.248541   3.180996   5.9145   9.736800  12.04490   
SISE        49089.0   3.048430   1.422796   0.8881   1.922000   2.66820   
TTKOM       49076.0   5.660795   0.818208   3.1900   5.267300   5.74640   
HALKB       49070.0  10.919575   3.071199   4.9100   8.720500  10.65450   
GUBRF       49056.0   4.328412   1.222844   2.2561   3.279125   4.25000   
ARCLK       49044.0  15.388402   4.530973   6.0608  11.711100  15.01500   
FROTO       48994.0  32.764362  14.732071  10.9305  21.493800  27.11820   
GOODY       48960.0   3.102549   0.886355   1.5019   2.427700   3.19245   
TOASO       48945.0  16.597614   6.327861   5.0917  10.365600  16.55540   
BRISA       48936.0   6.545030   1.294996   3.2433   5.890000   6.73000   
TKFEN       48929.0   9.191996   6.666985   3.0316   4.319000   5.75320   
MGROS       48902.0  19.576824   3.900305  11.2500  16.660000  19.11000   
TRKCM       48885.0   2.027891   1.099640   0.4006   1.174200   1.62700   
ZOREN       48806.0   1.248150   0.311282   0.6302   1.033800   1.25000   
ASELS       48802.0  13.432811   9.624152   2.9009   4.940300   9.27570   
OTKAR       48784.0  81.421197  27.780664  24.1387  56.775700  82.82240   
VESTL       48780.0   5.942833   2.830366   1.3300   4.020000   6.32000   
CCOLA       48748.0  36.891464   6.745213  23.7252  31.978200  34.82150   
BAGFS       48649.0  10.407341   3.617788   4.6300   8.267000  10.61000   
AKSA        48593.0   7.127651   2.709868   1.5268   5.208800   6.98530   
KARSN       48526.0   1.326934   0.290354   0.7862   1.110000   1.28740   
ECILC       48491.0   2.075908   0.973753   0.7669   1.172300   1.82140   
PRKME       48465.0   2.927169   0.721834   1.8300   2.389500   2.74000   
ASUZU       48432.0   6.467167   2.200862   2.3277   5.074800   5.94960   
TSKB        48383.0   0.945252   0.155218   0.6400   0.825400   0.93730   
ALARK       48334.0   2.060902   0.575873   1.1880   1.568900   1.93760   
SODA        48275.0   3.189657   2.045734   0.2447   1.475800   2.66840   
AKSEN       48170.0   3.183608   0.724194   2.0100   2.670000   2.93000   
ANACM       48164.0   1.672136   0.788336   0.7514   1.047000   1.25970   
AEFES       48130.0  20.982671   2.492193  13.4782  19.160500  20.64650   
AYGAZ       48118.0   8.102116   2.610168   3.2193   5.951500   7.72380   
BANVT       47950.0   7.628389   6.267246   1.7800   2.590000   3.71000   
USAK        47658.0   1.220477   0.459503   0.5353   0.957100   1.05000   
SASA        47632.0   2.294924   2.492938   0.1820   0.319200   0.73350   
KRDMB       47531.0   2.222844   0.686316   1.1997   1.561200   2.20070   
SKBNK       47269.0   1.473682   0.294833   0.8300   1.200000   1.51000   
ALBRK       46862.0   1.365549   0.167824   1.0255   1.225100   1.36020   
CEMAS       46393.0   1.209114   0.799970   0.5000   0.700000   0.87000   
YATAS       46054.0   2.434302   2.552380   0.2029   0.388600   0.96580   
KAREL       46031.0   3.178092   2.133591   1.1200   1.531300   1.82000   
TUKAS       45929.0   1.737529   0.867095   0.6500   1.060000   1.53000   
YUNSA       45527.0   4.079785   1.346899   2.0874   3.006700   4.10780   
PARSN       45324.0   8.277171   4.662360   2.2500   4.570000   7.89000   
PGSUS       45220.0  24.790035   7.655733  11.0900  17.790000  25.64000   
ICBCT       44335.0   2.828566   1.789853   0.6800   1.559600   2.03000   
ISYAT       43183.0   0.537350   0.160227   0.1984   0.441200   0.49570   
ISFIN       42876.0   1.559456   1.764844   0.2369   0.563900   0.86350   
ISDMR       12227.0   5.351663   1.697918   1.0181   4.854200   5.90630   

                   75%       max  
short_name                        
GARAN         8.678600   12.1554  
THYAO        12.270000   19.9500  
YKBNK         2.874000    3.9581  
ISCTR         5.820300    7.9639  
VAKBN         5.246000    7.5814  
AKBNK         6.932500    9.2124  
PETKM         3.882800    5.7697  
EREGL         6.758700   10.4710  
KRDMD         2.169000    4.9510  
TUPRS        93.428700  139.2937  
TCELL        11.236400   15.8125  
SAHOL         9.268600   11.6826  
KCHOL        15.169300   19.1500  
SISE          4.146000    6.9230  
TTKOM         6.260000    7.3500  
HALKB        13.490900   20.2365  
GUBRF         5.130000   13.6191  
ARCLK        19.087700   26.4278  
FROTO        48.511600   65.4192  
GOODY         3.596600   58.7574  
TOASO        20.651300   29.9218  
BRISA         7.330000   10.3275  
TKFEN        14.248900   27.3200  
MGROS        22.100000   30.2600  
TRKCM         2.982600    4.6432  
ZOREN         1.426500    2.4430  
ASELS        22.756700   46.7616  
OTKAR       105.498800  139.4288  
VESTL         7.450000   14.5400  
CCOLA        42.056525   54.2208  
BAGFS        12.350000   38.4352  
AKSA          8.720000   15.1189  
KARSN         1.470000    2.5000  
ECILC         2.780900    4.2278  
PRKME         3.436500    5.4300  
ASUZU         7.120000   15.2800  
TSKB          1.024400    1.4208  
ALARK         2.421400    3.5143  
SODA          4.286100    7.7659  
AKSEN         3.750000    5.1900  
ANACM         2.402100    3.5021  
AEFES        22.732000   28.5090  
AYGAZ        10.269000   13.5935  
BANVT        11.930000   28.6800  
USAK          1.370800    2.7578  
SASA          4.947300    8.4260  
KRDMB         2.727300    4.4960  
SKBNK         1.720700    2.2516  
ALBRK         1.500000    2.1900  
CEMAS         1.500000    7.0100  
YATAS         4.230000   10.6748  
KAREL         5.250000    9.4600  
TUKAS         2.130000    5.9200  
YUNSA         4.720600    9.5275  
PARSN        10.650000   29.8200  
PGSUS        29.440000   50.6500  
ICBCT         4.070000   11.2700  
ISYAT         0.633300    1.1500  
ISFIN         1.674200    9.8300  
ISDMR         6.560700    7.5936  

Let's see the plot of time series to check if any unusual fluctuation exists.

In [62]:
import matplotlib.pyplot as plt
nrow=10
ncol=6
fig, axes = plt.subplots(nrow, ncol, figsize = (30, 40))
cnt=0
for r in range(nrow):
    for c in range(ncol):
        df.groupby(df[df['short_name'] == df_summ_byName.index[cnt]]['timestamp_dt'])['price'].mean().plot(ax=axes[r,c])
        axes[r,c].set_ylabel(df_summ_byName.index[cnt])
        cnt+=1

GUBRF has an unusual jump at the end of 2014.\ GOODY has an unusual "era" in the middle of 2016.\ Let's analyse these stocks and relevant dates.\ Maximum values from summary table would benefit to spot the date. GUBRF: 13.6191 & GOODY: 58.7574

In [201]:
chk_3 = (df['short_name'] == 'GUBRF') & (df['price'] == 13.6191)
chk_4 = (df['short_name'] == 'GOODY') & (df['price'] == 58.7574)
print(df[chk_3])
print(df[chk_4])
        short_name             timestamp    price              timestamp_dt
1025255      GUBRF  2013-12-05T15:30:00Z  13.6191 2013-12-05 15:30:00+00:00
       short_name             timestamp    price              timestamp_dt
987040      GOODY  2015-07-15T12:00:00Z  58.7574 2015-07-15 12:00:00+00:00

Let's check the values before/after.

In [202]:
print(df.loc[1025254:1025256])
print(df.loc[987039:987041])
        short_name             timestamp    price              timestamp_dt
1025254      GUBRF  2013-12-05T15:15:00Z   3.4025 2013-12-05 15:15:00+00:00
1025255      GUBRF  2013-12-05T15:30:00Z  13.6191 2013-12-05 15:30:00+00:00
1025256      GUBRF  2013-12-06T07:30:00Z   3.3933 2013-12-06 07:30:00+00:00
       short_name             timestamp    price              timestamp_dt
987039      GOODY  2015-07-15T11:45:00Z   2.4886 2015-07-15 11:45:00+00:00
987040      GOODY  2015-07-15T12:00:00Z  58.7574 2015-07-15 12:00:00+00:00
987041      GOODY  2015-07-15T12:15:00Z   2.4808 2015-07-15 12:15:00+00:00

Mentioned maximum values are likely to be errors. They will be removed in the next step and then summary table will be updated.

In [203]:
df.drop(1025255, inplace = True)
df.drop(987040, inplace = True)
In [204]:
df_summ_byName = df.groupby(['short_name'])['price'].describe().sort_values(by = 'count', ascending = False)
print(df_summ_byName)
              count       mean        std      min       25%      50%  \
short_name                                                              
GARAN       49307.0   7.899894   1.249143   5.0936   7.01540   7.6542   
THYAO       49281.0   9.289010   4.027117   3.2356   6.43000   7.7800   
YKBNK       49224.0   2.566379   0.422620   1.5300   2.26820   2.6093   
ISCTR       49220.0   5.126655   1.003131   3.4441   4.32000   4.8543   
VAKBN       49211.0   4.735534   0.977666   3.0000   4.03220   4.4742   
AKBNK       49208.0   6.473236   0.944514   4.3227   5.85000   6.3057   
PETKM       49183.0   2.539289   1.378477   0.7379   1.28690   2.2845   
EREGL       49172.0   4.179629   2.690693   0.7289   2.18120   3.0360   
KRDMD       49160.0   1.768426   0.940068   0.7042   1.08450   1.3979   
TUPRS       49142.0  62.995817  32.397201  24.1844  34.54910  49.5542   
TCELL       49142.0   9.828203   2.355857   4.3504   8.56630   9.7001   
SAHOL       49094.0   8.616071   0.954528   6.4824   7.96520   8.6079   
KCHOL       49092.0  12.248541   3.180996   5.9145   9.73680  12.0449   
SISE        49089.0   3.048430   1.422796   0.8881   1.92200   2.6682   
TTKOM       49076.0   5.660795   0.818208   3.1900   5.26730   5.7464   
HALKB       49070.0  10.919575   3.071199   4.9100   8.72050  10.6545   
GUBRF       49055.0   4.328222   1.222137   2.2561   3.27825   4.2500   
ARCLK       49044.0  15.388402   4.530973   6.0608  11.71110  15.0150   
FROTO       48994.0  32.764362  14.732071  10.9305  21.49380  27.1182   
GOODY       48959.0   3.101412   0.849924   1.5019   2.42770   3.1920   
TOASO       48945.0  16.597614   6.327861   5.0917  10.36560  16.5554   
BRISA       48936.0   6.545030   1.294996   3.2433   5.89000   6.7300   
TKFEN       48929.0   9.191996   6.666985   3.0316   4.31900   5.7532   
MGROS       48902.0  19.576824   3.900305  11.2500  16.66000  19.1100   
TRKCM       48885.0   2.027891   1.099640   0.4006   1.17420   1.6270   
ZOREN       48806.0   1.248150   0.311282   0.6302   1.03380   1.2500   
ASELS       48802.0  13.432811   9.624152   2.9009   4.94030   9.2757   
OTKAR       48784.0  81.421197  27.780664  24.1387  56.77570  82.8224   
VESTL       48780.0   5.942833   2.830366   1.3300   4.02000   6.3200   
CCOLA       48748.0  36.891464   6.745213  23.7252  31.97820  34.8215   
BAGFS       48649.0  10.407341   3.617788   4.6300   8.26700  10.6100   
AKSA        48593.0   7.127651   2.709868   1.5268   5.20880   6.9853   
KARSN       48526.0   1.326934   0.290354   0.7862   1.11000   1.2874   
ECILC       48491.0   2.075908   0.973753   0.7669   1.17230   1.8214   
PRKME       48465.0   2.927169   0.721834   1.8300   2.38950   2.7400   
ASUZU       48432.0   6.467167   2.200862   2.3277   5.07480   5.9496   
TSKB        48383.0   0.945252   0.155218   0.6400   0.82540   0.9373   
ALARK       48334.0   2.060902   0.575873   1.1880   1.56890   1.9376   
SODA        48275.0   3.189657   2.045734   0.2447   1.47580   2.6684   
AKSEN       48170.0   3.183608   0.724194   2.0100   2.67000   2.9300   
ANACM       48164.0   1.672136   0.788336   0.7514   1.04700   1.2597   
AEFES       48130.0  20.982671   2.492193  13.4782  19.16050  20.6465   
AYGAZ       48118.0   8.102116   2.610168   3.2193   5.95150   7.7238   
BANVT       47950.0   7.628389   6.267246   1.7800   2.59000   3.7100   
USAK        47658.0   1.220477   0.459503   0.5353   0.95710   1.0500   
SASA        47632.0   2.294924   2.492938   0.1820   0.31920   0.7335   
KRDMB       47531.0   2.222844   0.686316   1.1997   1.56120   2.2007   
SKBNK       47269.0   1.473682   0.294833   0.8300   1.20000   1.5100   
ALBRK       46862.0   1.365549   0.167824   1.0255   1.22510   1.3602   
CEMAS       46393.0   1.209114   0.799970   0.5000   0.70000   0.8700   
YATAS       46054.0   2.434302   2.552380   0.2029   0.38860   0.9658   
KAREL       46031.0   3.178092   2.133591   1.1200   1.53130   1.8200   
TUKAS       45929.0   1.737529   0.867095   0.6500   1.06000   1.5300   
YUNSA       45527.0   4.079785   1.346899   2.0874   3.00670   4.1078   
PARSN       45324.0   8.277171   4.662360   2.2500   4.57000   7.8900   
PGSUS       45220.0  24.790035   7.655733  11.0900  17.79000  25.6400   
ICBCT       44335.0   2.828566   1.789853   0.6800   1.55960   2.0300   
ISYAT       43183.0   0.537350   0.160227   0.1984   0.44120   0.4957   
ISFIN       42876.0   1.559456   1.764844   0.2369   0.56390   0.8635   
ISDMR       12227.0   5.351663   1.697918   1.0181   4.85420   5.9063   

                   75%       max  
short_name                        
GARAN         8.678600   12.1554  
THYAO        12.270000   19.9500  
YKBNK         2.874000    3.9581  
ISCTR         5.820300    7.9639  
VAKBN         5.246000    7.5814  
AKBNK         6.932500    9.2124  
PETKM         3.882800    5.7697  
EREGL         6.758700   10.4710  
KRDMD         2.169000    4.9510  
TUPRS        93.428700  139.2937  
TCELL        11.236400   15.8125  
SAHOL         9.268600   11.6826  
KCHOL        15.169300   19.1500  
SISE          4.146000    6.9230  
TTKOM         6.260000    7.3500  
HALKB        13.490900   20.2365  
GUBRF         5.130000    8.6743  
ARCLK        19.087700   26.4278  
FROTO        48.511600   65.4192  
GOODY         3.596600   58.2892  
TOASO        20.651300   29.9218  
BRISA         7.330000   10.3275  
TKFEN        14.248900   27.3200  
MGROS        22.100000   30.2600  
TRKCM         2.982600    4.6432  
ZOREN         1.426500    2.4430  
ASELS        22.756700   46.7616  
OTKAR       105.498800  139.4288  
VESTL         7.450000   14.5400  
CCOLA        42.056525   54.2208  
BAGFS        12.350000   38.4352  
AKSA          8.720000   15.1189  
KARSN         1.470000    2.5000  
ECILC         2.780900    4.2278  
PRKME         3.436500    5.4300  
ASUZU         7.120000   15.2800  
TSKB          1.024400    1.4208  
ALARK         2.421400    3.5143  
SODA          4.286100    7.7659  
AKSEN         3.750000    5.1900  
ANACM         2.402100    3.5021  
AEFES        22.732000   28.5090  
AYGAZ        10.269000   13.5935  
BANVT        11.930000   28.6800  
USAK          1.370800    2.7578  
SASA          4.947300    8.4260  
KRDMB         2.727300    4.4960  
SKBNK         1.720700    2.2516  
ALBRK         1.500000    2.1900  
CEMAS         1.500000    7.0100  
YATAS         4.230000   10.6748  
KAREL         5.250000    9.4600  
TUKAS         2.130000    5.9200  
YUNSA         4.720600    9.5275  
PARSN        10.650000   29.8200  
PGSUS        29.440000   50.6500  
ICBCT         4.070000   11.2700  
ISYAT         0.633300    1.1500  
ISFIN         1.674200    9.8300  
ISDMR         6.560700    7.5936  

The comparison between mean and median implies that most of the distributions are skewed. Let's see KDEs with diverse bandwidths.

In [67]:
nrow=10
ncol=6
fig, axes = plt.subplots(nrow, ncol, figsize = (30, 40))
cnt=0
for r in range(nrow):
    for c in range(ncol):
        df[df['short_name'] == df_summ_byName.index[cnt]]['price'].plot.kde(ax=axes[r,c])
        df[df['short_name'] == df_summ_byName.index[cnt]]['price'].plot.kde(bw_method=.3, ax=axes[r,c])
        df[df['short_name'] == df_summ_byName.index[cnt]]['price'].plot.kde(bw_method=.5, ax=axes[r,c])
        df[df['short_name'] == df_summ_byName.index[cnt]]['price'].plot.kde(bw_method=1, ax=axes[r,c])
        axes[r,c].set_ylabel(df_summ_byName.index[cnt])
        cnt+=1

Most of the distributions is not normally distributed. That is why, it would be better to use median for central tendency if ever needed.

2. Moving Window Correlation¶

Dataset should be transformed to wide format.

In [69]:
df_wide = pd.pivot(df,index = 'timestamp_dt', columns = 'short_name', values = 'price')
df_wide.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 50011 entries, 2012-09-17 06:45:00+00:00 to 2019-07-23 15:00:00+00:00
Data columns (total 60 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AEFES   48130 non-null  float64
 1   AKBNK   49208 non-null  float64
 2   AKSA    48593 non-null  float64
 3   AKSEN   48170 non-null  float64
 4   ALARK   48334 non-null  float64
 5   ALBRK   46862 non-null  float64
 6   ANACM   48164 non-null  float64
 7   ARCLK   49044 non-null  float64
 8   ASELS   48802 non-null  float64
 9   ASUZU   48432 non-null  float64
 10  AYGAZ   48118 non-null  float64
 11  BAGFS   48649 non-null  float64
 12  BANVT   47950 non-null  float64
 13  BRISA   48936 non-null  float64
 14  CCOLA   48748 non-null  float64
 15  CEMAS   46393 non-null  float64
 16  ECILC   48491 non-null  float64
 17  EREGL   49172 non-null  float64
 18  FROTO   48994 non-null  float64
 19  GARAN   49307 non-null  float64
 20  GOODY   48960 non-null  float64
 21  GUBRF   49056 non-null  float64
 22  HALKB   49070 non-null  float64
 23  ICBCT   44335 non-null  float64
 24  ISCTR   49220 non-null  float64
 25  ISDMR   12227 non-null  float64
 26  ISFIN   42876 non-null  float64
 27  ISYAT   43183 non-null  float64
 28  KAREL   46031 non-null  float64
 29  KARSN   48526 non-null  float64
 30  KCHOL   49092 non-null  float64
 31  KRDMB   47531 non-null  float64
 32  KRDMD   49160 non-null  float64
 33  MGROS   48902 non-null  float64
 34  OTKAR   48784 non-null  float64
 35  PARSN   45324 non-null  float64
 36  PETKM   49183 non-null  float64
 37  PGSUS   45220 non-null  float64
 38  PRKME   48465 non-null  float64
 39  SAHOL   49094 non-null  float64
 40  SASA    47632 non-null  float64
 41  SISE    49089 non-null  float64
 42  SKBNK   47269 non-null  float64
 43  SODA    48275 non-null  float64
 44  TCELL   49142 non-null  float64
 45  THYAO   49281 non-null  float64
 46  TKFEN   48929 non-null  float64
 47  TOASO   48945 non-null  float64
 48  TRKCM   48885 non-null  float64
 49  TSKB    48383 non-null  float64
 50  TTKOM   49076 non-null  float64
 51  TUKAS   45929 non-null  float64
 52  TUPRS   49142 non-null  float64
 53  USAK    47658 non-null  float64
 54  VAKBN   49211 non-null  float64
 55  VESTL   48780 non-null  float64
 56  YATAS   46054 non-null  float64
 57  YKBNK   49224 non-null  float64
 58  YUNSA   45527 non-null  float64
 59  ZOREN   48806 non-null  float64
dtypes: float64(60)
memory usage: 23.3 MB

There are null values as expected. These values will be handled in 2 steps as follows.

In [86]:
df_wide.fillna(method = 'bfill', inplace = True)
df_wide.fillna(method = 'ffill', inplace = True)
df_wide.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 50011 entries, 2012-09-17 06:45:00+00:00 to 2019-07-23 15:00:00+00:00
Data columns (total 60 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AEFES   50011 non-null  float64
 1   AKBNK   50011 non-null  float64
 2   AKSA    50011 non-null  float64
 3   AKSEN   50011 non-null  float64
 4   ALARK   50011 non-null  float64
 5   ALBRK   50011 non-null  float64
 6   ANACM   50011 non-null  float64
 7   ARCLK   50011 non-null  float64
 8   ASELS   50011 non-null  float64
 9   ASUZU   50011 non-null  float64
 10  AYGAZ   50011 non-null  float64
 11  BAGFS   50011 non-null  float64
 12  BANVT   50011 non-null  float64
 13  BRISA   50011 non-null  float64
 14  CCOLA   50011 non-null  float64
 15  CEMAS   50011 non-null  float64
 16  ECILC   50011 non-null  float64
 17  EREGL   50011 non-null  float64
 18  FROTO   50011 non-null  float64
 19  GARAN   50011 non-null  float64
 20  GOODY   50011 non-null  float64
 21  GUBRF   50011 non-null  float64
 22  HALKB   50011 non-null  float64
 23  ICBCT   50011 non-null  float64
 24  ISCTR   50011 non-null  float64
 25  ISDMR   50011 non-null  float64
 26  ISFIN   50011 non-null  float64
 27  ISYAT   50011 non-null  float64
 28  KAREL   50011 non-null  float64
 29  KARSN   50011 non-null  float64
 30  KCHOL   50011 non-null  float64
 31  KRDMB   50011 non-null  float64
 32  KRDMD   50011 non-null  float64
 33  MGROS   50011 non-null  float64
 34  OTKAR   50011 non-null  float64
 35  PARSN   50011 non-null  float64
 36  PETKM   50011 non-null  float64
 37  PGSUS   50011 non-null  float64
 38  PRKME   50011 non-null  float64
 39  SAHOL   50011 non-null  float64
 40  SASA    50011 non-null  float64
 41  SISE    50011 non-null  float64
 42  SKBNK   50011 non-null  float64
 43  SODA    50011 non-null  float64
 44  TCELL   50011 non-null  float64
 45  THYAO   50011 non-null  float64
 46  TKFEN   50011 non-null  float64
 47  TOASO   50011 non-null  float64
 48  TRKCM   50011 non-null  float64
 49  TSKB    50011 non-null  float64
 50  TTKOM   50011 non-null  float64
 51  TUKAS   50011 non-null  float64
 52  TUPRS   50011 non-null  float64
 53  USAK    50011 non-null  float64
 54  VAKBN   50011 non-null  float64
 55  VESTL   50011 non-null  float64
 56  YATAS   50011 non-null  float64
 57  YKBNK   50011 non-null  float64
 58  YUNSA   50011 non-null  float64
 59  ZOREN   50011 non-null  float64
dtypes: float64(60)
memory usage: 23.3 MB
In [94]:
from seaborn import heatmap
corr = df_wide.corr()
plt.figure(figsize=(40,40))
heatmap(corr, cmap="RdBu",annot=True)
plt.show()

'SAHOL', 'MGROS', 'HALKB', 'THYAO' and 'GARAN' are selected for further analysis.

In [96]:
pd.plotting.scatter_matrix(df_wide[['SAHOL', 'MGROS', 'HALKB', 'THYAO', 'GARAN']])
Out[96]:
array([[<AxesSubplot:xlabel='SAHOL', ylabel='SAHOL'>,
        <AxesSubplot:xlabel='MGROS', ylabel='SAHOL'>,
        <AxesSubplot:xlabel='HALKB', ylabel='SAHOL'>,
        <AxesSubplot:xlabel='THYAO', ylabel='SAHOL'>,
        <AxesSubplot:xlabel='GARAN', ylabel='SAHOL'>],
       [<AxesSubplot:xlabel='SAHOL', ylabel='MGROS'>,
        <AxesSubplot:xlabel='MGROS', ylabel='MGROS'>,
        <AxesSubplot:xlabel='HALKB', ylabel='MGROS'>,
        <AxesSubplot:xlabel='THYAO', ylabel='MGROS'>,
        <AxesSubplot:xlabel='GARAN', ylabel='MGROS'>],
       [<AxesSubplot:xlabel='SAHOL', ylabel='HALKB'>,
        <AxesSubplot:xlabel='MGROS', ylabel='HALKB'>,
        <AxesSubplot:xlabel='HALKB', ylabel='HALKB'>,
        <AxesSubplot:xlabel='THYAO', ylabel='HALKB'>,
        <AxesSubplot:xlabel='GARAN', ylabel='HALKB'>],
       [<AxesSubplot:xlabel='SAHOL', ylabel='THYAO'>,
        <AxesSubplot:xlabel='MGROS', ylabel='THYAO'>,
        <AxesSubplot:xlabel='HALKB', ylabel='THYAO'>,
        <AxesSubplot:xlabel='THYAO', ylabel='THYAO'>,
        <AxesSubplot:xlabel='GARAN', ylabel='THYAO'>],
       [<AxesSubplot:xlabel='SAHOL', ylabel='GARAN'>,
        <AxesSubplot:xlabel='MGROS', ylabel='GARAN'>,
        <AxesSubplot:xlabel='HALKB', ylabel='GARAN'>,
        <AxesSubplot:xlabel='THYAO', ylabel='GARAN'>,
        <AxesSubplot:xlabel='GARAN', ylabel='GARAN'>]], dtype=object)

SAHOL has a linear relation with MGROS and GARAN.\ Between HALKB and THYAO, there seems to be a non-linear relation with a negative correlation (-55% from the previous chart).

Prices at day-close could be used for 30-day rolling correlation. Month-close would also be worth to assess. Let's arrange the data accordingly.

In [88]:
mask = df_wide.index.to_frame(index = False)
mask['date_str'] = mask['timestamp_dt'].dt.strftime('%Y-%m-%d')
mask['close'] = mask.groupby(['date_str'])['timestamp_dt'].rank(method="first", ascending=False)
mask.index = mask['timestamp_dt']
df_wide_close = df_wide[mask['close'] == 1]
In [89]:
df_wide_close['GARAN'].rolling(30).corr(df_wide_close['THYAO']).plot()
Out[89]:
<AxesSubplot:xlabel='timestamp_dt'>
In [90]:
df_wide_close['SAHOL'].rolling(30).corr(df_wide_close['GARAN']).plot()
Out[90]:
<AxesSubplot:xlabel='timestamp_dt'>
In [91]:
df_wide_close['SAHOL'].rolling(30).corr(df_wide_close['MGROS']).plot()
Out[91]:
<AxesSubplot:xlabel='timestamp_dt'>

There are unusual twists in which "sudden" negative correlations occur. These fluctuations show that there may be special occasions, that have impact on stock prices.

3. PCA¶

In [212]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
In [213]:
std_scaler = StandardScaler()
scaled_df = std_scaler.fit_transform(df_wide)
In [214]:
pca = PCA(n_components=3)
pca.fit_transform(scaled_df)
print(sum(pca.explained_variance_ratio_))
0.7873845938682055
In [215]:
from numpy import arange
nums = arange(16)
var_ratio = []
for num in nums:
  pca = PCA(n_components=num)
  pca.fit_transform(scaled_df)
  var_ratio.append(sum(pca.explained_variance_ratio_))
pd.DataFrame(var_ratio)
Out[215]:
0
0 0.000000
1 0.508158
2 0.680285
3 0.787385
4 0.840039
5 0.877802
6 0.900099
7 0.914928
8 0.925468
9 0.934236
10 0.942141
11 0.949946
12 0.955847
13 0.961157
14 0.965944
15 0.970018
In [216]:
plt.figure(figsize=(10,5),dpi=150)
plt.grid()
plt.plot(nums,var_ratio,marker='.')
plt.xlabel('Components')
plt.ylabel('Explained Variance')
plt.title('')
Out[216]:
Text(0.5, 1.0, '')
In [217]:
pd.DataFrame(pca.components_, columns=df_wide.columns)
Out[217]:
short_name AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
0 0.007884 0.109773 0.160242 0.064050 0.136022 0.005701 0.167446 0.121985 0.173918 0.148672 ... -0.014902 0.118653 0.167456 0.127119 0.091495 0.155937 0.170247 -0.040827 0.090020 0.128302
1 -0.163204 -0.215956 -0.037141 -0.161634 -0.086260 -0.113737 0.053171 -0.037718 -0.002523 -0.093909 ... -0.267312 0.064891 0.090344 -0.146539 -0.252506 0.029910 -0.021797 -0.276557 -0.002676 -0.101452
2 -0.266361 0.073627 0.072012 -0.150636 -0.105521 -0.213442 -0.089468 0.251793 0.004175 -0.130958 ... 0.109220 0.075144 -0.008282 0.082747 0.020538 0.088376 -0.054729 -0.057006 -0.261489 0.071375
3 0.012320 -0.088794 0.081267 -0.242351 -0.127211 0.238619 -0.090194 -0.016968 -0.092453 0.081723 ... 0.116449 0.123515 -0.035836 -0.182184 -0.068728 0.127475 -0.046387 0.016955 0.191121 0.179658
4 0.018392 -0.114373 0.065223 0.222204 -0.248843 -0.086334 -0.014453 -0.139324 0.053475 0.110114 ... -0.051197 -0.277873 -0.067223 0.070254 -0.109607 -0.004283 0.009788 -0.117374 0.172259 0.177518
5 0.013732 0.090825 -0.204364 0.227431 -0.138328 0.236182 0.100423 -0.156592 -0.066873 -0.008327 ... 0.011679 0.245150 0.095994 -0.051980 0.034947 0.131766 -0.000133 0.107169 -0.045460 0.129389
6 0.046402 0.104722 -0.156250 0.075086 -0.130244 0.355938 0.031930 0.117513 -0.037452 -0.078817 ... -0.093102 -0.319589 -0.007254 -0.158582 0.104918 0.008767 -0.130671 0.079387 0.081329 0.053083
7 -0.062990 -0.059549 -0.084064 -0.064799 0.002461 0.185296 -0.049895 -0.028142 0.163627 -0.097042 ... -0.167614 -0.000576 0.152095 0.194529 -0.030297 -0.241855 -0.047618 -0.152912 -0.117002 -0.123725
8 -0.140147 -0.005320 -0.090982 -0.122588 0.126794 0.015535 -0.046909 0.008446 -0.033480 -0.033334 ... -0.099281 0.028172 -0.041034 -0.042117 -0.010994 0.231828 -0.010268 0.170796 -0.073789 0.188342
9 0.496897 0.099233 -0.103102 -0.142824 -0.057264 -0.453259 -0.044484 -0.058300 -0.010849 -0.013321 ... -0.102889 -0.123427 -0.015317 -0.169483 -0.054636 0.014707 0.026023 0.123451 0.018873 -0.085393
10 0.241188 0.041454 -0.016686 -0.103325 0.007911 0.055355 -0.040887 0.003469 0.087258 0.158994 ... 0.012170 -0.114385 -0.065119 -0.198119 0.035625 0.002418 0.276943 -0.076269 -0.327880 0.199756
11 0.397264 -0.115214 0.229817 -0.017247 -0.041410 0.240661 0.013882 0.147506 -0.030173 -0.071267 ... 0.118626 0.156449 0.046077 0.022637 0.017031 -0.086168 0.045617 -0.104000 -0.031493 -0.332306
12 0.057513 0.091477 0.014362 0.089639 0.070061 0.225447 0.049025 -0.009631 -0.024883 -0.008671 ... -0.044172 0.153855 0.022403 -0.147228 -0.052122 -0.166953 -0.096372 -0.004817 -0.140800 -0.093225
13 -0.024837 0.074048 0.031450 -0.001343 -0.079062 -0.097576 -0.087464 -0.019965 -0.065182 -0.040868 ... 0.299080 0.319338 0.068809 0.099744 -0.036524 0.050107 -0.026607 0.120002 -0.028513 -0.044312
14 0.291867 -0.177947 0.017385 0.167961 0.152875 -0.125681 0.079799 -0.157281 0.025311 0.199349 ... -0.199750 0.181231 0.028121 0.028060 -0.091075 0.100254 -0.121050 -0.045223 0.056962 -0.171290

15 rows × 60 columns

In [218]:
from numpy import max, abs
loadings = pd.DataFrame(pca.components_[0:4, :], columns=df_wide.columns)
maxPC = 1.01 * max(max(abs(loadings.loc[0:4, :])))
f, axes = plt.subplots(4, 1, figsize=(15, 10), sharex=True)
for i, ax in enumerate(axes):
    pc_loadings = loadings.loc[i, :]
    colors = ['C0' if l > 0 else 'C1' for l in pc_loadings]
    ax.axhline(color='#888888')
    pc_loadings.plot.bar(ax=ax, color=colors)
    ax.set_ylabel(f'PC{i+1}')
    ax.set_ylim(-maxPC, maxPC)
plt.tight_layout()
plt.show()

Most of the loadings of PC1 has the same sign. This component is a contrast in the direction of BAGFS, CCOLA, HALKB and SKBNK.\ As I cannot interpret the industries of stock, I couldn't conclude which industries were prevailing for each PC. Nevertheless, as far as I can google stocks, I may claim that PC2 is driven by banking sector, and PC4 has the movement opposite to electiricty/energy market.\ Another perspective may be the shareholder structure of the companies. Domestic capital, foreign capital and/or joint capital would add a value to analysis in order to reveal latent patterns.

4. VS. Google Trends¶

I have selected a new set as GARAN, THYAO, ASELS, TTKOM to analyse with Google data.\ In addition to search volume of stocks, I would like to get Turkish Lira and TCMB search volumes over time. As I expect some correlation.\ Further, I'll get the search volume of the term of "war", "hisse yatirim" and "ekonomik kriz".

In [219]:
gtrends = pd.read_csv('gtrends.csv', sep = ';')
gtrends.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82 entries, 0 to 81
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   yearmonth      82 non-null     int64
 1   TRY            82 non-null     int64
 2   GARAN          82 non-null     int64
 3   THYAO          82 non-null     int64
 4   ASELS          82 non-null     int64
 5   TTKOM          82 non-null     int64
 6   WAR            82 non-null     int64
 7   TCMB           82 non-null     int64
 8   HISSE_YATIRIM  82 non-null     int64
 9   ECO_CRISIS     82 non-null     int64
dtypes: int64(10)
memory usage: 6.5 KB
In [220]:
df_gt = df_wide[['GARAN', 'THYAO', 'ASELS', 'TTKOM']].reset_index(drop = True).copy()
df_gt['yearmonth'] = df_wide.index.to_frame(index = False)['timestamp_dt'].dt.year*100+df_wide.index.to_frame(index = False)['timestamp_dt'].dt.month
print(df_gt.info())
print(df_gt.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50011 entries, 0 to 50010
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   GARAN      50011 non-null  float64
 1   THYAO      50011 non-null  float64
 2   ASELS      50011 non-null  float64
 3   TTKOM      50011 non-null  float64
 4   yearmonth  50011 non-null  int32  
dtypes: float64(4), int32(1)
memory usage: 1.7 MB
None
short_name   GARAN   THYAO   ASELS   TTKOM  yearmonth
0           6.3715  3.3661  2.9948  4.2639     201209
1           6.3386  3.3574  2.9948  4.2521     201209
2           6.3386  3.3661  2.9855  4.2521     201209
3           6.3715  3.3748  2.9855  4.2521     201209
4           6.3715  3.3748  2.9760  4.2521     201209
In [221]:
df_gt_agg = df_gt.groupby('yearmonth')[['GARAN', 'THYAO', 'ASELS', 'TTKOM']].mean()
df_gt_comp = df_gt_agg.merge(gtrends, on = 'yearmonth', how = 'left', suffixes = ['_price', '_search'])
df_gt_comp.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83 entries, 0 to 82
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   yearmonth      83 non-null     int32  
 1   GARAN_price    83 non-null     float64
 2   THYAO_price    83 non-null     float64
 3   ASELS_price    83 non-null     float64
 4   TTKOM_price    83 non-null     float64
 5   TRY            82 non-null     float64
 6   GARAN_search   82 non-null     float64
 7   THYAO_search   82 non-null     float64
 8   ASELS_search   82 non-null     float64
 9   TTKOM_search   82 non-null     float64
 10  WAR            82 non-null     float64
 11  TCMB           82 non-null     float64
 12  HISSE_YATIRIM  82 non-null     float64
 13  ECO_CRISIS     82 non-null     float64
dtypes: float64(13), int32(1)
memory usage: 8.9 KB
In [222]:
df_gt_comp.index = df_gt_comp['yearmonth']
df_gt_comp.drop(columns = ['yearmonth'], inplace = True)
In [223]:
df_gt_comp.corr()
Out[223]:
GARAN_price THYAO_price ASELS_price TTKOM_price TRY GARAN_search THYAO_search ASELS_search TTKOM_search WAR TCMB HISSE_YATIRIM ECO_CRISIS
GARAN_price 1.000000 0.487095 0.681801 0.440804 0.189476 -0.591768 0.518645 0.501831 -0.213711 -0.073987 0.440271 0.511696 -0.013251
THYAO_price 0.487095 1.000000 0.776196 -0.266401 0.708602 -0.601653 0.339088 0.762021 0.225743 0.012838 0.661255 0.579338 0.572691
ASELS_price 0.681801 0.776196 1.000000 -0.122877 0.687123 -0.904185 0.711621 0.897638 0.092118 -0.113522 0.827285 0.722440 0.488478
TTKOM_price 0.440804 -0.266401 -0.122877 1.000000 -0.544373 0.169204 -0.123347 -0.295553 -0.467888 0.132264 -0.408956 -0.164994 -0.524568
TRY 0.189476 0.708602 0.687123 -0.544373 1.000000 -0.649617 0.451095 0.696845 0.189140 -0.062534 0.782933 0.506736 0.886001
GARAN_search -0.591768 -0.601653 -0.904185 0.169204 -0.649617 1.000000 -0.806698 -0.753180 -0.035518 0.134208 -0.842710 -0.686532 -0.484149
THYAO_search 0.518645 0.339088 0.711621 -0.123347 0.451095 -0.806698 1.000000 0.542887 0.058637 -0.186357 0.610799 0.620097 0.196061
ASELS_search 0.501831 0.762021 0.897638 -0.295553 0.696845 -0.753180 0.542887 1.000000 0.127954 -0.038641 0.804261 0.622576 0.549421
TTKOM_search -0.213711 0.225743 0.092118 -0.467888 0.189140 -0.035518 0.058637 0.127954 1.000000 -0.066593 0.205423 0.074203 0.159880
WAR -0.073987 0.012838 -0.113522 0.132264 -0.062534 0.134208 -0.186357 -0.038641 -0.066593 1.000000 -0.136062 0.034693 0.081840
TCMB 0.440271 0.661255 0.827285 -0.408956 0.782933 -0.842710 0.610799 0.804261 0.205423 -0.136062 1.000000 0.613644 0.670010
HISSE_YATIRIM 0.511696 0.579338 0.722440 -0.164994 0.506736 -0.686532 0.620097 0.622576 0.074203 0.034693 0.613644 1.000000 0.363834
ECO_CRISIS -0.013251 0.572691 0.488478 -0.524568 0.886001 -0.484149 0.196061 0.549421 0.159880 0.081840 0.670010 0.363834 1.000000

The strongest correlation is between ASELS price and its search volume.\ The correlation with "hisse yatirim" is ASELS as it may imply ASELS took the attention of "investors" more than other selected stocks in the table.\ Search for "war" has weak correlations with stock prices.